Entity Resolution and Tracking on Social Networks a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Abstract
In this thesis we study two aspects of the Entity Resolution (ER) problem. The goal of ER is to identify and merge records that refer to the same underlying entity. The recent rise in adoption of social networks (Facebook, Google+, Twitter, and others) introduces two new twists to the traditional ER problem: crowdsourcing and limited information.

We first study a hybrid human-machine approach to solving ER problems. Machine learning models can predict the probability that a pair of records refers to the same entity, but machines make mistakes. Humans can help verify whether two records are equal, and social systems like Facebook allow users to help resolve entities on their platforms. We propose hybrid human-machine strategies with theoretical guarantees that leverage transitivity relations (e.g., a = c can be inferred given a = b and b = c).

Next, we study the problem of ER with limited information. Social systems impose limits on API calls that constrain access to their full social graphs. We focus on the resolution of a single node g from one social graph G against a second social graph T. We want to find the best match for g in T by dynamically probing T through a public API, within the number of API calls that these social systems allow. We propose two ER strategies that are designed for limited information and can be adapted to different API limits.

Finally, we study the problem of updating social graph snapshots with limited information. Effective social network ER requires up-to-date snapshots. Because the number of API calls is limited, we seek to update a snapshot efficiently: we want to avoid re-crawling all of the nodes and to minimize the number of API calls. We propose novel snapshot update strategies that are designed for limited information and can be adapted to different levels of staleness.
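The transitivity idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's strategy, only an assumed simplification: pairs are asked in descending order of machine-predicted match probability, and a union-find structure skips any pair whose equality already follows from earlier answers. The names `UnionFind`, `resolve_with_crowd`, and `ask_human`, as well as the sample data, are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


class UnionFind:
    """Disjoint-set structure that tracks which records are known to be equal."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        self.parent[self.find(a)] = self.find(b)


def resolve_with_crowd(
    scored_pairs: List[Tuple[Pair, float]],
    ask_human: Callable[[Hashable, Hashable], bool],
) -> Tuple[UnionFind, int]:
    """Ask about likely matches first; skip pairs already implied equal."""
    clusters = UnionFind()
    questions = 0
    for (a, b), _prob in sorted(scored_pairs, key=lambda item: -item[1]):
        if clusters.find(a) == clusters.find(b):
            continue  # a = b already follows by transitivity
        questions += 1
        if ask_human(a, b):
            clusters.union(a, b)
    return clusters, questions


# Hypothetical data: three records of one entity and one unrelated record.
truth = {"a": 1, "b": 1, "c": 1, "d": 2}
pairs = [(("a", "b"), 0.9), (("b", "c"), 0.8), (("a", "c"), 0.7), (("c", "d"), 0.2)]
clusters, asked = resolve_with_crowd(pairs, lambda x, y: truth[x] == truth[y])
```

In this toy run the pair (a, c) is never sent to a human, because a = b and b = c have already been confirmed. The sketch only exploits positive transitivity; inferring non-matches (a = b and b ≠ c imply a ≠ c) would need extra bookkeeping.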
Similar resources
Fluid Interaction for High Resolution Wall-size Displays a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Gaze-enhanced User Interface Design a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Structuring Peer Interactions for Massive Scale Learning a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Incorporating Uncertainty in Data Management and Integration a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Simulation-based Search for Hybrid System Control and Analysis a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Haptics and Physical Simulation for Virtual Bone Surgery a Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy